Dirichlet Mixtures: a Method for Improved Detection of Weak but Significant Protein Sequence Homology

Authors

  • Kevin Karplus
  • Michael Brown
  • Richard Hughey
Abstract

We present a method for condensing the information in multiple alignments of proteins into a mixture of Dirichlet densities over amino acid distributions. Dirichlet mixture densities are designed to be combined with observed amino acid frequencies to form estimates of expected amino acid probabilities at each position in a profile, hidden Markov model, or other statistical model. These estimates give a statistical model greater generalization capacity, so that remotely related family members can be more reliably recognized by the model. This paper corrects the previously published formula for estimating these expected probabilities, and contains complete derivations of the Dirichlet mixture formulas, methods for optimizing the mixtures to match particular databases, and suggestions for efficient implementation.
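As a rough illustration of how a Dirichlet mixture can be combined with observed counts, the sketch below computes the standard posterior-mean estimate under a Dirichlet mixture prior: each component is weighted by its posterior probability given the counts, and the per-component estimates (count plus pseudocount, normalized) are averaged. This page reproduces only the abstract, so this is a sketch of the general technique and not the paper's own derivation; the function names, the three-letter alphabet, and the toy parameters `TOY_WEIGHTS` and `TOY_ALPHAS` are assumptions made for illustration.

```python
from math import exp, lgamma, log


def log_beta(values):
    """Log of the multivariate Beta function: sum(lgamma(v)) - lgamma(sum(v))."""
    return sum(lgamma(v) for v in values) - lgamma(sum(values))


def posterior_mean(counts, weights, alphas):
    """Posterior-mean symbol probabilities under a Dirichlet mixture prior.

    counts  : observed counts, one per symbol
    weights : mixture coefficients (should sum to 1)
    alphas  : one Dirichlet parameter vector per mixture component
    """
    # Log posterior weight of each component, up to a shared constant:
    # log q_j + log B(alpha_j + n) - log B(alpha_j).
    log_post = []
    for q, alpha in zip(weights, alphas):
        combined = [a + n for a, n in zip(alpha, counts)]
        log_post.append(log(q) + log_beta(combined) - log_beta(alpha))

    # Normalize in a numerically stable way (subtract the max before exp).
    peak = max(log_post)
    unnormalized = [exp(lp - peak) for lp in log_post]
    total = sum(unnormalized)
    responsibilities = [u / total for u in unnormalized]

    # Average the per-component estimates (n_i + alpha_ji) / (|n| + |alpha_j|).
    n_total = sum(counts)
    probs = [0.0] * len(counts)
    for r, alpha in zip(responsibilities, alphas):
        denom = n_total + sum(alpha)
        for i, (a, n) in enumerate(zip(alpha, counts)):
            probs[i] += r * (n + a) / denom
    return probs


# Hypothetical two-component mixture over a toy three-symbol alphabet.
TOY_WEIGHTS = [0.5, 0.5]
TOY_ALPHAS = [[1.0, 1.0, 2.0], [2.0, 1.0, 1.0]]

if __name__ == "__main__":
    print(posterior_mean([3, 0, 1], TOY_WEIGHTS, TOY_ALPHAS))
```

With no observations the estimate reduces to the weighted average of the component means, and as counts accumulate it approaches the observed frequencies, which is the behavior that lets a model built from few sequences still generalize.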


Similar resources

Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology

We present a method for condensing the information in multiple alignments of proteins into a mixture of Dirichlet densities over amino acid distributions. Dirichlet mixture densities are designed to be combined with observed amino acid frequencies to form estimates of expected amino acid probabilities at each position in a profile, hidden Markov model or other statistical model. These estimates...


Dirichlet Mixtures: A Method for Improving Detection of Weak but Significant Protein Sequence Homology

This paper presents the mathematical foundations of Dirichlet mixtures, which have been used to improve database search results for homologous sequences when a variable number of sequences from a protein family or domain are known. We present a method for condensing the information in a protein database into a mixture of Dirichlet densities. These mixtures are designed to be combined with observed...


iProsite: an improved prosite database achieved by replacing ambiguous positions with more informative representations

The PROSITE database contains a set of entries corresponding to protein families, which are used to identify the family of a protein from its sequence. Although patterns and profiles are developed to be very selective, each may produce false-positive or false-negative hits. Considering false positives as items that reduce the selectiveness of a pattern, then, the more selective pattern we have, a more accur...


Expression analyses of endoglucanase gene in Penicillium oxalicum and Trichoderma viride

The expression of the endoglucanase gene and the protein profile of two fungal species, Penicillium oxalicum 1SMS and Trichoderma viride 156MS, both with high cellulase enzyme activity, were investigated. Fungal isolates were cultured on inducer CMC medium, and the amounts of released sugar and protein were then assayed every three days for a month, using arsenate molybdate reagent and Bradford method,...


Dirichlet Mixtures, the Dirichlet Process, and the Structure of Protein Space

The Dirichlet process is used to model probability distributions that are mixtures of an unknown number of components. Amino acid frequencies at homologous positions within related proteins have been fruitfully modeled by Dirichlet mixtures, and we use the Dirichlet process to derive such mixtures with an unbounded number of components. This application of the method requires several technical ...



Publication date: 1996